---
title: 'Add Data Sources'
description: 'Learn how to ingest data from various sources in PandasAI'
---

<Note title="Beta Notice">
Release v3 is currently in beta. This documentation reflects the features and functionality in progress and may change before the final release.
</Note>


## What type of data does PandasAI support?
PandasAI mission is to make data analysis and manipulation more efficient and accessible to everyone. You can work with data in various ways:

- **CSV and Excel Files**: Load data directly from files using simple Python functions
- **SQL Databases**: Connect to various SQL databases using our extensions
- **Cloud Data**: Work with enterprise-scale data using our specialized extensions (requires Enterprise License)

Let's start with the basics of loading CSV files, and then we'll explore the different extensions available.


## How to work with CSV files in PandasAI?

Loading data from CSV files is straightforward with PandasAI:

```python
import pandasai as pai

# Basic CSV loading
file = pai.read_csv("data.csv")

# Use the semantic layer on CSV
df = pai.create(
    path="company/sales-data",
    df = file,
    description="Sales data from our retail stores",
    columns={
        "transaction_id": {"type": "string", "description": "Unique identifier for each sale"},
        "sale_date": {"type": "datetime", "description": "Date and time of the sale"},
        "product_id": {"type": "string", "description": "Product identifier"},
        "quantity": {"type": "integer", "description": "Number of units sold"},
        "price": {"type": "float", "description": "Price per unit"}
    },
)

# Chat with the dataframe
response = df.chat("Which product has the highest sales?")

```

## How to work with SQL in PandasAI?

PandasAI provides a sql extension for you to work with SQL, PostgreSQL, MySQL, and CockroachDB databases.
To make the library lightweight and easy to use, the basic installation of the library does not include this extension.
It can be easily installed using pip with the specific database you want to use:

```bash
pip install pandasai-sql[postgres]
pip install pandasai-sql[mysql]
pip install pandasai-sql[cockroachdb]
```

Once you have installed the extension, you can use the [semantic data layer](/v3/semantic-layer#for-sql-databases-using-the-create-method) and perform [data transformations](/docs/v3/transformations).

```python
sql_table = pai.create(
    path="example/mysql-dataset",
    description="Heart disease dataset from MySQL database",
    source={
        "type": "mysql",
        "connection": {
            "host": "database.example.com",
            "port": 3306,
            "user": "${DB_USER}",
            "password": "${DB_PASSWORD}",
            "database": "medical_data"
        },
        "table": "heart_data",
        "columns": [
            {"name": "Age", "type": "integer", "description": "Age of the patient in years"},
            {"name": "Sex", "type": "string", "description": "Gender of the patient (M = male, F = female)"},
            {"name": "ChestPainType", "type": "string", "description": "Type of chest pain (ATA, NAP, ASY, TA)"},
            {"name": "RestingBP", "type": "integer", "description": "Resting blood pressure in mm Hg"},
            {"name": "Cholesterol", "type": "integer", "description": "Serum cholesterol in mg/dl"},
            {"name": "FastingBS", "type": "integer", "description": "Fasting blood sugar > 120 mg/dl (1 = true, 0 = false)"},
            {"name": "RestingECG", "type": "string", "description": "Resting electrocardiogram results (Normal, ST, LVH)"},
            {"name": "MaxHR", "type": "integer", "description": "Maximum heart rate achieved"},
            {"name": "ExerciseAngina", "type": "string", "description": "Exercise-induced angina (Y = yes, N = no)"},
            {"name": "Oldpeak", "type": "float", "description": "ST depression induced by exercise relative to rest"},
            {"name": "ST_Slope", "type": "string", "description": "Slope of the peak exercise ST segment (Up, Flat, Down)"},
            {"name": "HeartDisease", "type": "integer", "description": "Heart disease diagnosis (1 = present, 0 = absent)"}
        ]
    }
)
```

## How to work with Enterprise Cloud Data in PandasAI?

PandasAI provides Enterprise Edition extensions for connecting to cloud data. These extensions require an Enterprise License or [Data Platform](/v3/ai-dashboards) team plan.
Once you have installed a enterprise cloud data extension, you can use it to connect to your cloud data.

### Snowflake extension (ee)

First, install the extension:
```bash
poetry add pandasai-snowflake
# or
pip install pandasai-snowflake
```

Then use it:
```yaml
name: sales_data

source:
  type: snowflake
  connection:
    account: your-account
    warehouse: your-warehouse
    database: your-database
    schema: your-schema
    user: ${SNOWFLAKE_USER}
    password: ${SNOWFLAKE_PASSWORD}
  table: sales_data

destination:
  type: local
  format: parquet
  path: company/snowflake-sales

columns:
  - name: transaction_id
    type: string
    description: Unique identifier for each sale
  - name: sale_date
    type: datetime
    description: Date and time of the sale
  - name: product_id
    type: string
    description: Product identifier
  - name: quantity
    type: integer
    description: Number of units sold
  - name: price
    type: float
    description: Price per unit

transformations:
  - type: convert_timezone
    params:
      column: sale_date
      from: UTC
      to: America/Chicago
  - type: calculate
    params:
      column: revenue
      formula: quantity * price
  - type: round
    params:
      column: revenue
      decimals: 2

update_frequency: daily

order_by:
  - sale_date DESC

limit: 100000
```

### Databricks extension (ee)

First, install the extension:
```bash
poetry add pandasai-databricks
# or
pip install pandasai-databricks
```

Then use it:
```yaml
name: customer_data

source:
  type: databricks
  connection:
    host: your-workspace-url
    token: ${DATABRICKS_TOKEN}
  table: customers

destination:
  type: local
  format: parquet
  path: company/databricks-customers

columns:
  - name: customer_id
    type: string
    description: Unique identifier for each customer
  - name: name
    type: string
    description: Customer's full name
  - name: email
    type: string
    description: Customer's email address
  - name: join_date
    type: datetime
    description: Date when customer joined
  - name: total_purchases
    type: integer
    description: Total number of purchases made

transformations:
  - type: anonymize
    params:
      columns: [email, name]
  - type: convert_timezone
    params:
      column: join_date
      from: UTC
      to: Europe/London
  - type: calculate
    params:
      column: customer_tier
      formula: "CASE WHEN total_purchases > 100 THEN 'Gold' WHEN total_purchases > 50 THEN 'Silver' ELSE 'Bronze' END"

update_frequency: daily

order_by:
  - join_date DESC

limit: 100000
```

### BigQuery extension (ee)

First, install the extension:
```bash
poetry add pandasai-bigquery
# or
pip install pandasai-bigquery
```

Then use it:
```yaml
name: inventory_data

source:
  type: bigquery
  connection:
    project_id: your-project-id
    credentials: ${GOOGLE_APPLICATION_CREDENTIALS}
  table: inventory

destination:
  type: local
  format: parquet
  path: company/bigquery-inventory

columns:
  - name: product_id
    type: string
    description: Unique identifier for each product
  - name: product_name
    type: string
    description: Name of the product
  - name: category
    type: string
    description: Product category
  - name: stock_level
    type: integer
    description: Current quantity in stock
  - name: last_updated
    type: datetime
    description: Last inventory update timestamp

transformations:
  - type: categorize
    params:
      column: stock_level
      bins: [0, 20, 100, 500]
      labels: ["Low", "Medium", "High"]
  - type: extract
    params:
      column: product_name
      pattern: "(.*?)\\s*-\\s*(.*)"
      into: [brand, model]
  - type: convert_timezone
    params:
      column: last_updated
      from: UTC
      to: Asia/Tokyo

update_frequency: hourly

order_by:
  - last_updated DESC

limit: 50000
```

### Oracle extension (ee)

First, install the extension:
```bash
poetry add pandasai-oracle
# or
pip install pandasai-oracle
```

Then use it:
```yaml
name: sales_data

source:
  type: oracle
  connection:
    host: your-host
    port: 1521
    service_name: your-service
    user: ${ORACLE_USER}
    password: ${ORACLE_PASSWORD}
  table: sales_data

destination:
  type: local
  format: parquet
  path: company/oracle-sales

columns:
  - name: transaction_id
    type: string
    description: Unique identifier for each sale
  - name: sale_date
    type: datetime
    description: Date and time of the sale
  - name: product_id
    type: string
    description: Product identifier
  - name: quantity
    type: integer
    description: Number of units sold
  - name: price
    type: float
    description: Price per unit

transformations:
  - type: convert_timezone
    params:
      column: sale_date
      from: UTC
      to: Australia/Sydney
  - type: calculate
    params:
      column: total_amount
      formula: quantity * price
  - type: round
    params:
      column: total_amount
      decimals: 2
  - type: calculate
    params:
      column: discount
      formula: "CASE WHEN quantity > 10 THEN 0.1 WHEN quantity > 5 THEN 0.05 ELSE 0 END"

update_frequency: daily

order_by:
  - sale_date DESC

limit: 100000
```

### Yahoo Finance extension

First, install the extension:
```bash
poetry add pandasai-yfinance
# or
pip install pandasai-yfinance
```

Then use it:
```yaml
name: stock_data

source:
  type: yahoo_finance
  symbols: 
    - GOOG
    - MSFT
    - AAPL
  start_date: 2023-01-01
  end_date: 2023-12-31

destination:
  type: local
  format: parquet
  path: company/market-data

columns:
  - name: date
    type: datetime
    description: Date of the trading day
  - name: open
    type: float
    description: Opening price of the stock
  - name: high
    type: float
    description: Highest price of the stock during the day
  - name: low
    type: float
    description: Lowest price of the stock during the day
  - name: close
    type: float
    description: Closing price of the stock
  - name: volume
    type: integer
    description: Number of shares traded during the day

transformations:
  - type: calculate
    params:
      column: daily_return
      formula: (close - open) / open * 100
  - type: calculate
    params:
      column: price_range
      formula: high - low
  - type: round
    params:
      columns: [daily_return, price_range]
      decimals: 2
  - type: convert_timezone
    params:
      column: date
      from: UTC
      to: America/New_York

update_frequency: daily

order_by:
  - date DESC

limit: 100000
```

## All data extensions

  <table style={{ borderCollapse: 'collapse', width: '100%', border: '1px solid #ccc' }}>
<tr>
  <th style={{ border: '1px solid #ccc', padding: '8px 16px', textAlign: 'left' }}>extension</th>
  <th style={{ border: '1px solid #ccc', padding: '8px 16px', textAlign: 'left' }}>install with poetry</th>
  <th style={{ border: '1px solid #ccc', padding: '8px 16px', textAlign: 'left' }}>install with pip</th>
  <th style={{ border: '1px solid #ccc', padding: '8px 16px', textAlign: 'left' }}>need ee license?</th>
</tr>
<tr>
  <td style={{ border: '1px solid #ccc', padding: '8px 16px' }}>pandasai_sql</td>
  <td style={{ border: '1px solid #ccc', padding: '8px 16px' }}><code>poetry add pandasai-sql[postgres]</code></td>
  <td style={{ border: '1px solid #ccc', padding: '8px 16px' }}><code>pip install pandasai-sql[postgres]</code></td>
  <td style={{ border: '1px solid #ccc', padding: '8px 16px' }}>No</td>
</tr>
<tr>
  <td style={{ border: '1px solid #ccc', padding: '8px 16px' }}>pandasai_yfinance</td>
  <td style={{ border: '1px solid #ccc', padding: '8px 16px' }}><code>poetry add pandasai-yfinance</code></td>
  <td style={{ border: '1px solid #ccc', padding: '8px 16px' }}><code>pip install pandasai-yfinance</code></td>
  <td style={{ border: '1px solid #ccc', padding: '8px 16px' }}>No</td>
</tr>
<tr>
  <td style={{ border: '1px solid #ccc', padding: '8px 16px' }}>pandasai_snowflake</td>
  <td style={{ border: '1px solid #ccc', padding: '8px 16px' }}><code>poetry add pandasai-snowflake</code></td>
  <td style={{ border: '1px solid #ccc', padding: '8px 16px' }}><code>pip install pandasai-snowflake</code></td>
  <td style={{ border: '1px solid #ccc', padding: '8px 16px' }}>Yes</td>
</tr>
<tr>
  <td style={{ border: '1px solid #ccc', padding: '8px 16px' }}>pandasai_databricks</td>
  <td style={{ border: '1px solid #ccc', padding: '8px 16px' }}><code>poetry add pandasai-databricks</code></td>
  <td style={{ border: '1px solid #ccc', padding: '8px 16px' }}><code>pip install pandasai-databricks</code></td>
  <td style={{ border: '1px solid #ccc', padding: '8px 16px' }}>Yes</td>
</tr>
<tr>
  <td style={{ border: '1px solid #ccc', padding: '8px 16px' }}>pandasai_bigquery</td>
  <td style={{ border: '1px solid #ccc', padding: '8px 16px' }}><code>poetry add pandasai-bigquery</code></td>
  <td style={{ border: '1px solid #ccc', padding: '8px 16px' }}><code>pip install pandasai-bigquery</code></td>
  <td style={{ border: '1px solid #ccc', padding: '8px 16px' }}>Yes</td>
</tr>
<tr>
  <td style={{ border: '1px solid #ccc', padding: '8px 16px' }}>pandasai_oracle</td>
  <td style={{ border: '1px solid #ccc', padding: '8px 16px' }}><code>poetry add pandasai-oracle</code></td>
  <td style={{ border: '1px solid #ccc', padding: '8px 16px' }}><code>pip install pandasai-oracle</code></td>
  <td style={{ border: '1px solid #ccc', padding: '8px 16px' }}>Yes</td>
</tr>
</table>